{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# QDA & LDA\n",
    "### Bayes with Different Covariance Matrix\n",
    "\n",
    "- Estimate the full covariance matrix for the classes\n",
    "\n",
    ">$\\displaystyle {\\cal{}L}_{\\boldsymbol{x}}(C_k) =  G(\\boldsymbol{x};\\mu_k, \\Sigma_k)$\n",
    "\n",
    "> Handles correlated features well\n",
    "\n",
    "- Consider binary problem with 2 classes\n",
    "\n",
    "> Taking the negative logarithm of the likelihoods we compare\n",
    "\n",
    ">$\\displaystyle (x\\!-\\!\\mu_1)^T\\,\\Sigma_1^{-1}(x\\!-\\!\\mu_1) + \\ln\\,\\lvert \\Sigma_1  \\lvert $ vs.\n",
    "\n",
    ">$\\displaystyle (x\\!-\\!\\mu_2)^T\\,\\Sigma_2^{-1}(x\\!-\\!\\mu_2) + \\ln \\, \\lvert\\Sigma_2\\lvert $\n",
    "\n",
    "> If the difference is lower than a threshold, we classify it accordingly\n",
    "\n",
    "- This is called **Quadratic Discriminant Analysis**\n",
    "\n",
    "### Bayes with Same Covariance Matrix\n",
    "\n",
    "- When $\\Sigma_1=\\Sigma_2=\\Sigma$, the quadratic terms cancel from the difference\n",
    " \n",
    ">$\\displaystyle (x\\!-\\!\\mu_1)^T\\,\\Sigma^{-1}(x\\!-\\!\\mu_1) $ \n",
    ">$\\displaystyle -\\ (x\\!-\\!\\mu_2)^T\\,\\Sigma^{-1}(x\\!-\\!\\mu_2) $\n",
    "\n",
    "- Hence this is called **Linear Discriminant Analysis**\n",
    "\n",
    "> Fewer parameters to estimate during the learning process\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example QDA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MyQDA(dict):\n",
    "     \n",
    "    def fit(self,X,C):\n",
    "        for k in np.unique(C):\n",
    "            members = (C==k)\n",
    "            prior = members.sum() / float(C.size)\n",
    "            S = X[members,:] # subset of class\n",
    "            mu = S.mean(axis=0)    \n",
    "            Z = (S-mu).T # centered column vectors\n",
    "            \n",
    "            cov = Z.dot(Z.T) / (Z[0,:].size-1)\n",
    "            \n",
    "            self[k] = (mu,cov,prior)\n",
    "        print(mu.shape)\n",
    "        return self\n",
    "            \n",
    "    def predict(self,Y):\n",
    "        Cpred = -1 * np.ones(Y[:,0].size)\n",
    "        for i in range(Cpred.size):\n",
    "            d2min, kbest = 1e99, None\n",
    "            for k in self:\n",
    "                mu, cov, prior = self[k]\n",
    "                diff = (Y[i,:]-mu).T\n",
    "                d2 = diff.T.dot(np.linalg.inv(cov)).dot(diff) / 2\n",
    "                d2 += np.log(np.linalg.det(cov)) / 2 - np.log(prior) \n",
    "                if d2<d2min: d2min,kbest = d2,k\n",
    "            Cpred[i] = kbest\n",
    "        return Cpred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2,)\n",
      "Number of different estimates:0\n"
     ]
    }
   ],
   "source": [
    "# reference implementation\n",
    "from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA\n",
    "\n",
    "D = np.loadtxt('/home/akhil/machine-learning-basics/examples/Class-Train.csv', delimiter=',')\n",
    "Q = np.loadtxt('/home/akhil/machine-learning-basics/examples/Class-Query.csv', delimiter=',')\n",
    "X, C = D[:,0:2], D[:,2]\n",
    "\n",
    "Cpred = MyQDA().fit(X,C).predict(Q)\n",
    "Cskit =   QDA().fit(X,C).predict(Q)\n",
    "\n",
    "print('Number of different estimates:{:d}'.format(sum(Cpred!=Cskit))) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# QDA with Cross Validation Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 2)\n",
      "(2, 2)\n",
      "Case #0 - Number of mislabeled points out of a total 157 points : 19\n",
      "(2, 2)\n",
      "(2, 2)\n",
      "Case #1 - Number of mislabeled points out of a total 156 points : 20\n"
     ]
    }
   ],
   "source": [
    "Dc = D.copy()\n",
    "# randomize and split to D1 + D2\n",
    "np.random.seed(seed=42)\n",
    "np.random.shuffle(Dc)\n",
    "split = int(Dc[:,0].size/2)\n",
    "D1, D2 = Dc[:split,:], Dc[split:,:]\n",
    "# train on one estimate on the other\n",
    "for i,(T,Q) in enumerate([(D1,D2),(D2,D1)]):\n",
    "    X, C = T[:,0:2], T[:,2]\n",
    "    Cpred, Ctrue = MyQDA().fit(X,C).predict(Q[:,:2]), Q[:,2]\n",
    "    print (\"Case #{:d} - Number of mislabeled points out of a total {:3d} points : {:2d}\".format(i, \n",
    "                                                                     Q.shape[0],sum(Ctrue!=Cpred)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example Using sklearn package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.82352941 0.82352941 0.82352941 0.875      0.73333333 0.8\n",
      " 1.         1.         1.         0.93333333]\n",
      "0.8812254901960784\n",
      "0.09139601703590215\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import cross_val_score\n",
    "clf = QDA()\n",
    "cv = cross_val_score(clf, X,C, cv=10)\n",
    "print(cv)\n",
    "print(np.mean(cv))\n",
    "print(np.std(cv))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 196,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MyLDA(dict):\n",
    "     \n",
    "    def fit(self,X,C):\n",
    "        for k in np.unique(C):\n",
    "            members = (C==k)\n",
    "            prior = members.sum() / float(C.size)\n",
    "            S = X[members,:] # subset of class\n",
    "            mu = S.mean(axis=0) \n",
    "            if k == np.unique(C)[0]: \n",
    "                Z = (S-mu).T # centered column vectors\n",
    "                cov = Z.dot(Z.T) / (Z[0,:].size-1)\n",
    "            self[k] = (mu,cov,prior)\n",
    "        return self\n",
    "\n",
    "    def predict(self,Y):\n",
    "        Cpred = -1 * np.ones(Y[:,0].size)\n",
    "        for i in range(Cpred.size):\n",
    "            d2min, kbest = 1e99, None\n",
    "            for k in self:\n",
    "                mu, cov, prior = self[k]\n",
    "                diff = (Y[i,:]-mu)\n",
    "                d2 = (diff.T.dot(np.linalg.inv(cov)).dot(diff))/2 - np.log(prior)\n",
    "                if d2<d2min: d2min,kbest = d2,k\n",
    "            Cpred[i] = kbest\n",
    "        return Cpred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of different estimates:40\n"
     ]
    }
   ],
   "source": [
    "# reference implementation\n",
    "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA\n",
    "\n",
    "D = np.loadtxt('/home/akhil/machine-learning-basics/examples/Class-Train.csv', delimiter=',')\n",
    "Q = np.loadtxt('/home/akhil/machine-learning-basics/examples/Class-Query.csv', delimiter=',')\n",
    "X, C = D[:,0:2], D[:,2]\n",
    "\n",
    "Cpred = MyLDA().fit(X,C).predict(Q)\n",
    "Cskit =   LDA().fit(X,C).predict(Q)\n",
    "print('Number of different estimates: {:d}'.format(sum(Cpred!=Cskit))) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "https://web.stanford.edu/class/stats202/content/lec9.pdf"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
